Clustering-Based Relevance Feedback for Web Pages

Authors

  • Seung Yeol Yoo
  • Achim G. Hoffmann
Abstract

Most traditional relevance feedback systems simply take the top-ranked Web pages for a query as the source of weights for candidate query expansion terms. However, the full content of such top-ranked pages usually mixes several distinguishable sub-topics and is too heterogeneous to yield good-quality candidate expansion terms directly. The basic idea of this paper is that Web pages properly clustered into a sub-topic cluster are a better source than the whole set of given pages, because they provide more topically coherent relevance feedback for that specific sub-topic. We therefore propose Clustering-Based Relevance Feedback for Web Pages, which uses three methods to cluster given or retrieved Web pages into several sub-topic clusters. The three methods cooperate to construct good-quality clusters by supporting, respectively, Web page segmentation, term (or feature) selection, and selection of k seed centroids. The automatically selected terms serve as the relevance feedback used to construct all sub-topic clusters and to assign the given Web pages to the proper clusters. Each subset of the selected terms that occurs in the Web pages assigned to a sub-topic cluster serves as the relevance feedback for expanding a query over that cluster. Our experimental results show that clustering performance based on two traditional term-weighting methods (an unsupervised method and a supervised method) was significantly improved by our methods.
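The general idea of the abstract (cluster the retrieved pages into sub-topics, then draw expansion terms from one cluster rather than from all pages) can be illustrated with a minimal Python sketch. It is not the authors' method: the abstract does not specify how segmentation, term selection, or seed selection work, so this sketch assumes plain TF-IDF weighting, a farthest-first choice of k seed pages, and a few cosine k-means iterations. All function names and the toy pages are hypothetical.

```python
import math
from collections import Counter

def tokenize(text):
    # Assumption: whitespace tokens, lowercased, alphabetic only.
    return [w for w in text.lower().split() if w.isalpha()]

def tfidf_vectors(docs):
    # Unit-length TF-IDF vectors stored as {term: weight} dicts.
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(t for toks in tokenized for t in set(toks))
    vectors = []
    for toks in tokenized:
        vec = {t: c * math.log((1 + n) / (1 + df[t]))
               for t, c in Counter(toks).items()}
        norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
        vectors.append({t: v / norm for t, v in vec.items()})
    return vectors

def cosine(a, b):
    # Dot product of two unit-length sparse vectors.
    return sum(v * b.get(t, 0.0) for t, v in a.items())

def pick_seeds(vectors, k):
    # Farthest-first seeding (an assumption, not the paper's seed selection):
    # start with page 0, then add the page least similar to the chosen seeds.
    seeds = [0]
    while len(seeds) < k:
        rest = [i for i in range(len(vectors)) if i not in seeds]
        seeds.append(min(rest, key=lambda i: max(
            cosine(vectors[i], vectors[s]) for s in seeds)))
    return seeds

def centroid(members):
    # Normalized sum of the member vectors.
    total = Counter()
    for vec in members:
        for t, v in vec.items():
            total[t] += v
    norm = math.sqrt(sum(v * v for v in total.values())) or 1.0
    return {t: v / norm for t, v in total.items()}

def cluster(vectors, k, iterations=10):
    # Cosine k-means: assign each page to its most similar centroid,
    # then recompute the centroids from the assigned pages.
    centroids = [dict(vectors[i]) for i in pick_seeds(vectors, k)]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        labels = [max(range(k), key=lambda c: cosine(v, centroids[c]))
                  for v in vectors]
        for c in range(k):
            members = [v for v, l in zip(vectors, labels) if l == c]
            if members:
                centroids[c] = centroid(members)
    return labels, centroids

def expansion_terms(centroid_vec, query, top_n=3):
    # Highest-weighted cluster terms that are not already in the query.
    query_terms = set(tokenize(query))
    ranked = sorted(centroid_vec.items(), key=lambda kv: (-kv[1], kv[0]))
    return [t for t, w in ranked if t not in query_terms and w > 0][:top_n]

# Toy "retrieved pages" for the ambiguous query "jaguar" (made-up data).
pages = [
    "jaguar car engine speed luxury",
    "jaguar car dealer price luxury",
    "jaguar cat jungle prey habitat",
    "jaguar cat rainforest habitat predator",
]
labels, centroids = cluster(tfidf_vectors(pages), k=2)
for c, cen in enumerate(centroids):
    print(c, expansion_terms(cen, "jaguar"))
```

In this toy setting the car pages and the animal pages fall into separate clusters, so the expansion terms proposed for each cluster stay within one sub-topic instead of mixing both senses of the query.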

Similar resources

A New Hybrid Method for Web Pages Ranking in Search Engines

There are many algorithms for optimizing search engine results; ranking takes place according to one or more parameters such as backward links, forward links, content, and click-through rate. The quality and performance of these algorithms depend on the listed parameters. The ranking is one of the most important components of the search engine that represents the degree of the vitality...

Web pages ranking algorithm based on reinforcement learning and user feedback

The main challenge of a search engine is ranking web documents to provide the best response to a user's query. Despite the huge number of results extracted for a user's query, only a small number of the first results are examined by users; therefore, placing the relevant results in the top ranks is of great importance. In this paper, a ranking algorithm based on the reinforcement le...

Query expansion based on relevance feedback and latent semantic analysis

Web search engines are among the most popular tools on the Internet and are widely used by expert and novice users alike. Constructing an adequate query that best specifies the user's information need to the search engine is an important concern of web users. Query expansion is a way to reduce this concern and increase user satisfaction. In this paper, a new method of query expa...

Use of Semantic Similarity and Web Usage Mining to Alleviate the Drawbacks of User-Based Collaborative Filtering Recommender Systems

One of the most famous methods for recommendation is user-based Collaborative Filtering (CF). This system compares the active user's item ratings with the historical rating records of other users to find similar users, and recommends items that seem interesting to these similar users and have not been rated by the active user. As a way of computing recommendations, the ultimate goal of the user-ba...

Latent Semantic Space for Web Clustering

Organizing a huge number of Web pages into topics according to their relevance is an efficient and effective method for information retrieval. Latent Semantic Space (LSS), which naturally takes the form of a geometric structure in Combinatorial Topology, has been proposed for unstructured document clustering. Given a set of Web pages, the set of associations among frequently co-occurring terms in t...

Publication date: 2006